Semiparametric analysis of zero-inflated count data.
نویسندگان
چکیده
Medical and public health research often involve the analysis of count data that exhibit a substantially large proportion of zeros, such as the number of heart attacks and the number of days of missed primary activities in a given period. A zero-inflated Poisson regression model, which hypothesizes a two-point heterogeneity in the population characterized by a binary random effect, is generally used to model such data. Subjects are broadly categorized into the low-risk group leading to structural zero counts and high-risk (or normal) group so that the counts can be modeled by a Poisson regression model. The main aim is to identify the explanatory variables that have significant effects on (i) the probability that the subject is from the low-risk group by means of a logistic regression formulation; and (ii) the magnitude of the counts, given that the subject is from the high-risk group by means of a Poisson regression where the effects of the covariates are assumed to be linearly related to the natural logarithm of the mean of the counts. In this article we consider a semiparametric zero-inflated Poisson regression model that postulates a possibly nonlinear relationship between the natural logarithm of the mean of the counts and a particular covariate. A sieve maximum likelihood estimation method is proposed. Asymptotic properties of the proposed sieve maximum likelihood estimators are discussed. Under some mild conditions, the estimators are shown to be asymptotically efficient and normally distributed. Simulation studies were carried out to investigate the performance of the proposed method. For illustration purpose, the method is applied to a data set from a public health survey conducted in Indonesia where the variable of interest is the number of days of missed primary activities due to illness in a 4-week period.
منابع مشابه
Hurdle, Inflated Poisson and Inflated Negative Binomial Regression Models for Analysis of Count Data with Extra Zeros
In this paper, we propose Hurdle regression models for analysing count responses with extra zeros. A method of estimating maximum likelihood is used to estimate model parameters. The application of the proposed model is presented in insurance dataset. In this example, there are many numbers of claims equal to zero is considered that clarify the application of the model with a zero-inflat...
متن کاملSemiparametric analysis of longitudinal zero-inflated count data
Background: The instrumental activities of daily living (IADLs) are important index of physical functioning in older adult studies. These count outcomes with a large proportion of zeros are often collected in longitudinal studies. Data were from the Hispanic Established Population for Epidemiological Study of the Elderly (HEPESE), a four wave (seven years) longitudinal study of community-dwelli...
متن کاملSemiparametric Inference Based on a Class of Zero-Altered Distributions
In modeling count data collected from manufacturing processes, economic series, disease outbreaks and ecological surveys, there are usually a relatively large or small number of zeros compared to positive counts. Such low or high frequencies of zero counts often require the use of under or over dispersed probability models for the underlying data generating mechanism. The commonly used models s...
متن کاملGeneralized partially linear single-index model for zero-inflated count data.
Count data often arise in biomedical studies, while there could be a special feature with excessive zeros in the observed counts. The zero-inflated Poisson model provides a natural approach to accounting for the excessive zero counts. In the semiparametric framework, we propose a generalized partially linear single-index model for the mean of the Poisson component, the probability of zero, or b...
متن کاملBayesian generalized additive models for location, scale and shape for zero-inflated and overdispersed count data
Frequent problems in applied research that prevent the application of the classical Poisson log-linear model for analyzing count data include overdispersion, an excess of zeros compared to the Poisson distribution, correlated responses, as well as complex predictor structures comprising nonlinear effects of continuous covariates, interactions or spatial effects. We propose a general class of Ba...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Biometrics
دوره 62 4 شماره
صفحات -
تاریخ انتشار 2006